Supplementary File 2: Graphics Processing Units and CUDA

Authors

  • M. S. Nobile
  • P. Cazzaniga
  • A. Tangherloni
  • D. Besozzi
Abstract

In the context of high-performance computing (HPC), the traditional solutions for distributed architectures are computer clusters and grid computing [10]. The former exploits a set of inter-connected computers controlled by a centralized scheduler, while the latter consists of a logical organization of geographically distributed (and heterogeneous) computing resources. In both cases, the overall computational task is partitioned into smaller sub-tasks, which are assigned to the various computing units for parallel or distributed computation. These infrastructures are particularly appealing because they usually require minimal changes to the existing source code of a given program: as a matter of fact, the computing units are generally based on classic architectures (e.g., the x86 instruction set, typical of personal computers), so that the code can be easily ported, except for possible modifications required for message passing. Moreover, both architectures support the Multiple Instruction Multiple Data (MIMD) execution paradigm: all computing units are independent and asynchronous, and can work on different data and execute different code.

Despite these advantages, computer clusters and grid computing have considerable drawbacks. On the one hand, computer clusters are expensive, require maintenance, and are characterized by considerable energy consumption. On the other hand, grid computing [8] is generally based on volunteering, whereby computer owners donate resources (e.g., computing power, storage) to a specific project [2, 3]. Several factors may further affect grid computing, notably the fact that remote computers might not be completely trustworthy: potentially unpublished data are transmitted to unknown remote clients for processing, and the returned results might be intentionally erroneous or misleading.
Moreover, there is no general guarantee about the availability of remote computers, so some allocated tasks might never be processed. A third approach to distributed computation is the emerging field of cloud computing, in which private companies offer a pool of computing resources (e.g., computers, storage) available on demand and ubiquitously over the Internet. Although cloud computing mitigates some problems of classic distributed architectures, such as the costs of the infrastructure and its maintenance, it is characterized by other problems, chiefly the fact that data are stored on servers owned by private companies. This raises issues of privacy, potential piracy, espionage, continuity of service (e.g., due to malfunctioning, DDoS attacks, or Internet connection problems), international legal conflicts, and data lock-in, along with the typical problems of Big Data, e.g., transferring terabyte-scale data to and from the cloud [4].

In recent years, a completely different approach to HPC has gained ground: the use of general-purpose multi-core devices such as Many Integrated Core (MIC) co-processors [23] and Graphics Processing Units (GPUs) [18]. Notably, both types of devices can be installed in common consumer computers and are characterized by a large number of computing cores (up to 61 for MICs and 5760 for GPUs, at the time of writing). MIC cores are based on the x86 instruction set, extended with 512-bit vector instructions, and inter-connected by means of a ring bus. Thanks to this architectural choice, any existing code developed for Central Processing Units (CPUs) should be easily portable to the MIC architecture.
In addition, MICs offer two main programming models: the native model, in which the source code is executed directly on the MIC, exploiting its multiple cores for parallel execution, and the heterogeneous offload model, in which simple compiler directives designate the code sections to be executed on the MIC, while the rest of the code runs on the CPU. Unlike MICs, GPUs are pervasive, relatively cheap, and extremely efficient parallel multi-core devices.
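As a concrete taste of the GPU programming model that this file introduces, the following is a minimal CUDA C++ sketch (not taken from the file itself): a vector-addition kernel in which each GPU thread processes one array element, a fine-grained data-parallel decomposition that maps naturally onto thousands of CUDA cores. All calls are standard CUDA runtime API; compiling and running it requires `nvcc` and an NVIDIA GPU.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread computes one element; the guard handles the last,
// possibly partial, block of threads.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host-side buffers.
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device-side buffers and host-to-device transfers.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %.1f\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

The kernel launch syntax `<<<blocks, threads>>>` specifies the execution grid: unlike the MIMD model of clusters and grids, all threads of a launch execute the same kernel code on different data.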




Publication date: 2016